Cross-Language Information Retrieval Based on Category Matching of Web Directories

Authors

  • Fuminori Kimura
  • Akira Maeda
  • Masatoshi Yoshikawa
  • Shunsuke Uemura
Abstract

With the growing popularity of the Internet, more and more languages are being used for Web documents. Accordingly, Cross-Language Information Retrieval (CLIR), a method of retrieving documents written in one or more languages using a query written in another language, has been actively studied. A variety of methods have been studied, including the use of corpus statistics for translating terms and for disambiguating the translated terms, and some results have been obtained. However, because corpus-based methods depend heavily on the domain of the content, their retrieval effectiveness may be poor for domains that do not match the content of the corpus. In this paper, we propose a method that employs a Web directory available in multiple language versions, such as Yahoo!, for CLIR of Web documents. Feature terms are extracted from the Web documents in each category, and one or more corresponding categories are determined by comparing the similarities of categories across languages. We aim to resolve the ambiguities of dictionary translation and to improve retrieval effectiveness by limiting the categories to be retrieved.
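The category-matching idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the term weighting, the similarity measure, or the disambiguation rule, so everything below is an assumption made for illustration: raw term counts as feature weights, cosine similarity between categories, and a rule that picks the translation with the highest weight in the matched category. The function names, the toy Spanish-English dictionary, and the category data are all hypothetical.

```python
import math
from collections import Counter


def feature_terms(docs, top_k=5):
    """Top_k most frequent terms in a category's documents (raw counts; an assumed weighting)."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return dict(counts.most_common(top_k))


def translate_vector(vec, dictionary):
    """Map each source term to ALL of its dictionary translations, keeping the ambiguity."""
    out = Counter()
    for term, weight in vec.items():
        for target in dictionary.get(term, []):
            out[target] += weight
    return dict(out)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_category(source_vec, target_categories, dictionary):
    """Return the most similar target-language category and all similarity scores."""
    translated = translate_vector(source_vec, dictionary)
    scores = {name: cosine(translated, vec) for name, vec in target_categories.items()}
    return max(scores, key=scores.get), scores


def disambiguate(term, dictionary, category_vec):
    """Pick the candidate translation with the highest weight in the matched category."""
    candidates = dictionary.get(term, [])
    return max(candidates, key=lambda c: category_vec.get(c, 0.0)) if candidates else None


# Hypothetical toy data: "banco" is ambiguous between "bank" and "bench".
dictionary = {
    "banco": ["bank", "bench"],
    "credito": ["credit"],
    "interes": ["interest"],
    "prestamo": ["loan"],
}
source_docs = ["banco credito interes", "banco prestamo interes"]
target_categories = {
    "Finance": {"bank": 3, "loan": 2, "interest": 2},
    "Furniture": {"bench": 3, "chair": 2, "table": 1},
}

source_vec = feature_terms(source_docs)
best, scores = match_category(source_vec, target_categories, dictionary)
chosen = disambiguate("banco", dictionary, target_categories[best])
```

In this toy setup the source category is matched to the target category that shares the most translated feature terms, and the matched category's term weights are then used to choose among the candidate translations of an ambiguous query term. Restricting retrieval to documents in the matched category would be the corresponding way to limit the categories to be retrieved.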

Similar resources

Analysis of Appropriate Category Level of Web Directory for Cross-Language Information Retrieval

In this paper, we analyze the appropriate category level of a Web directory for Cross-Language Information Retrieval (CLIR). Our proposed CLIR method is based on estimating the domains of a query using the hierarchical structure of Web directories. Correct domain estimation therefore requires detecting the appropriate category level of the Web directory. We conducted retrieval experiments using ...

Cross-Language Information Retrieval based on category matching between language versions of a web directory

Since the Web consists of documents in various domains or genres, the method for Cross-Language Information Retrieval (CLIR) of Web documents should be independent of a particular domain. In this paper, we propose a CLIR method which employs a Web directory provided in multiple language versions (such as Yahoo!). In the proposed method, feature terms are first extracted from Web documents for e...

Impact of Controlled and Free Language Use in Retrieving Articles from the ProQuest and Science Direct Databases

Introduction: The growth and expansion of the Internet have changed the way information is accessed, and many facilities have been created on the Web to make locating information easier and faster. Objective: To identify the impact of keyword documentation using the medical thesaurus on the retrieval of articles from the ProQuest and Science Direct databases. Materials and Methods: The pr...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes that generate a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Publication date: 2002